Abstract:
Associative classification is a supervised classification method. Many experimental studies have shown that associative classification is a promising approach. However, it suffers from a major drawback: the huge number of generated classification rules, which makes it costly to select the best ones for constructing the classifier. To overcome this drawback, we propose in this paper a new direct associative classification method called IGARC, an improvement of the GARC approach that directly extracts generic associative classification rules from a training set in order to reduce the number of associative classification rules without jeopardising classification accuracy. Moreover, we propose an algorithm called PN-GARC that deals with negative classification rules. Considering negated items in a classification framework provides additional information describing the data and reduces conflicts when classifying new objects. Nevertheless, the number of rules becomes very large when negated items are considered. That is why we explore generic classification rules, both negative and positive, in order to study their behaviour and their usefulness on the studied datasets. A detailed description of the IGARC method is presented, together with an experimental study on 12 benchmark datasets showing that it is highly competitive in terms of accuracy with popular classification approaches.
Abstract:
One major goal of data mining is to understand data. Rule-based methods are better than other methods at making mining results comprehensible. However, current rule-based classifiers use a small number of rules and a default prediction to build a concise predictive model, which reduces the explanatory ability of the classifier. In this paper, we propose to use multiple and negative target rules to improve the explanatory ability of rule-based classifiers. We show experimentally that this gain in understandability does not come at the cost of accuracy.
Abstract:
Traditional classification rules take the positive form C→D. This paper introduces a new method for retrieving rules of the negative form ¬C→¬D. Negative rules can improve classification quality in some cases. We propose a classification algorithm named Rule Generation based on Classification Attribute (RGCA) to deduce both negative and positive rules. The RGCA algorithm does not need to process records item by item. Real data are used to verify the presented algorithm. The results show that RGCA generates more negative rules than positive rules, and that the classification accuracy of RGCA is better than that of traditional algorithms based on positive rules only.
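To make the two rule forms concrete, the following is a minimal sketch of how the confidence of a positive rule C→D and of a negative rule ¬C→¬D could be computed under the usual support-confidence definitions. It is not the RGCA algorithm, whose details the abstract does not give; the function names and the toy records are assumptions made for illustration only.

```python
from typing import AbstractSet, List, Set

def support(records: List[Set[str]],
            present: AbstractSet[str] = frozenset(),
            absent: AbstractSet[str] = frozenset()) -> float:
    """Fraction of records that contain every item in `present`
    and none of the items in `absent`."""
    hits = sum(1 for r in records if present <= r and not (absent & r))
    return hits / len(records)

def confidence_positive(records: List[Set[str]], c: str, d: str) -> float:
    """Confidence of C -> D: supp(C and D) / supp(C)."""
    antecedent = support(records, present={c})
    if antecedent == 0:
        return 0.0  # rule never applies; 0.0 is an illustrative convention
    return support(records, present={c, d}) / antecedent

def confidence_negative(records: List[Set[str]], c: str, d: str) -> float:
    """Confidence of not-C -> not-D: supp(no C and no D) / supp(no C)."""
    antecedent = support(records, absent={c})
    if antecedent == 0:
        return 0.0  # rule never applies; 0.0 is an illustrative convention
    return support(records, absent={c, d}) / antecedent

# Hypothetical toy data: each record is the set of items it contains.
records = [
    {"C", "D"}, {"C", "D"}, {"C"}, {"D"},
    set(), set(), set(), set(),
]

pos = confidence_positive(records, "C", "D")  # 2 of the 3 records with C also have D
neg = confidence_negative(records, "C", "D")  # 4 of the 5 records without C also lack D
```

On this toy data the negative rule is the more reliable of the two, which is the kind of situation in which mining ¬C→¬D adds information that the positive rule alone does not provide.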
Abstract:
Association rules (ARs) have been applied to classification and variable selection. However, currently, only positive ARs are used for variable selection, while only special forms of positive and negative association rules (PNARs) are used for classification. The purpose of this work was to investigate variable selection and classification methods by mining another, more general form of PNARs, one that is more suitable for binary classification and variable selection problems. The algorithm for mining such PNARs exploits the downward closure property of negative itemsets. It is built based solely on items in a transactional database and on equivalence classes under the support-confidence framework. The algorithm combines the process of mining frequent itemsets and rule generation and is both sound and complete. Experiments on 10 binary datasets show that the variable selection and classification methods built on the PNARs mined by the proposed algorithm outperform, respectively, variable selection methods based on the mutual information measure and the chi-squared test, and 10 popular classification algorithms.
Abstract:
The paper proposes phrase-level emotion patterns using a neuro-fuzzy model. At the initial stage, emotional patterns at phrase level are obtained using POS tags and EMOT Actifiers, which results in 16 patterns. These patterns work well on sentences with a single emotion and classify them into Positive and Negative polarities. However, the patterns are unable to define the exact boundary between the positive and negative polarities of these sentence patterns, and this imprecise boundary affects classification accuracy. Mixed emotions occur in long sentences with multiple phrases, so the sentences are broken up at phrase level. The patterns are extracted at phrase level and converted into fuzzy rules for the classification of mixed emotion patterns. Intensity grades are calculated for the patterns based on the features of the phrases and their structure in the sentence. These intensity grades classify the phrase-level patterns into Positive and Negative emotions. Based on the intensity grades, a suitable weighting mechanism is proposed for multi-phrase sentence structures, which determines the degree of Positive and Negative polarity of emotion in a sentence: the phrasal pattern with the higher weight decides the polarity of the sentence. The proposed approach achieves good F-scores compared with other approaches on benchmark datasets.
Abstract:
Traditional image classification systems often use only positive fuzzy rules, ignoring useful negative classification information. Thanh Minh Nguyen and Q. M. Jonathan Wu introduced negative fuzzy rules into image classification, combined positive and negative fuzzy rules into a positive and negative fuzzy rule system, and applied it to remote sensing and natural image classification. Their experiments demonstrated that the method achieves promising results. However, since their method was realized with a feedforward neural network model whose weights are adjusted by gradient descent, training is very slow. The extreme learning machine (ELM) is a learning algorithm for single-hidden-layer feedforward neural networks (SLFNs), with distinctive advantages such as fast learning and good generalization performance. In this paper, the equivalence between ELM and the positive and negative fuzzy rule system is revealed, so ELM can naturally be used to train the positive and negative fuzzy rule system quickly for image classification. Our experimental results support this claim.
Abstract:
We study the localization prediction of membrane proteins for two families of medically important disease-causing bacteria, called gram-negative and gram-positive bacteria. Each such bacterium has its cell surrounded by several layers of membranes. Identifying where proteins are located in a bacterial cell is of primary research interest for antibiotic and vaccine drug design. This problem has three requirements. First, with any subsequence of amino acid residues being potentially a dimension, it has an extremely high dimensionality, few being irrelevant. Second, the prediction of a target localization site must have a high precision in order to be useful to biologists, i.e., at least 90 percent or even 95 percent, while recall is as high as possible. Achieving such a precision is made harder by the fact that target sequences are often much fewer than background sequences. Third, the rationale of the prediction should be understandable to biologists so that they can act on it. Meeting all these requirements presents a significant challenge in that a high dimensionality requires a complex model that is often hard to understand. The support vector machine (SVM) model performs outstandingly in a high-dimensional space and therefore addresses the first two requirements. However, the SVM model involves many features in a single kernel function and therefore does not address the third requirement. We address all three requirements by integrating the SVM model with a rule-based model, where the understandable if-then rules capture "major structures" and the elaborated SVM model captures "subtle structures". Importantly, the integrated model preserves the precision/recall performance of SVM and, at the same time, exposes major structures in a form understandable to the human user. We focus on searching for high quality rules and partitioning the prediction between rules and SVM so as to achieve these properties.
We evaluate our method on several membrane localization problems. The purpose of this paper is not to improve the precision/recall of SVM, but to expose the rationale of an SVM classifier by partitioning the classification between if-then rules and the SVM classifier while preserving the precision/recall of SVM.
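The idea of partitioning prediction between if-then rules and a second model can be sketched as follows. This is only an illustration of the general pattern, not the authors' method: the `Rule` type, the motif strings, the labels, and `toy_fallback` (a stand-in for a trained SVM, not a real one) are all hypothetical.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

class Rule(NamedTuple):
    motif: str  # hypothetical: subsequence that must occur in the protein sequence
    label: str  # localization site predicted when the rule fires

def predict(sequence: str,
            rules: List[Rule],
            fallback: Callable[[str], str]) -> Tuple[str, Optional[Rule]]:
    """Return (label, explanation).

    The first matching rule decides the label and doubles as the
    human-readable explanation. If no rule fires, the decision is
    deferred to `fallback` and no rule-based explanation is available.
    """
    for rule in rules:
        if rule.motif in sequence:
            return rule.label, rule
    return fallback(sequence), None

def toy_fallback(sequence: str) -> str:
    """Stand-in for a trained SVM decision function (assumption, not an SVM):
    predicts 'membrane' when at least half of the residues are hydrophobic."""
    hydrophobic = sum(sequence.count(a) for a in "AILMFVW")
    return "membrane" if hydrophobic * 2 >= len(sequence) else "background"

rules = [Rule(motif="LLVV", label="membrane")]

by_rule = predict("MKLLVVAG", rules, toy_fallback)   # the rule fires and explains itself
by_model = predict("MKDEKR", rules, toy_fallback)    # no rule fires; fallback decides
```

In this arrangement the rules handle the cases they can explain, and everything else goes to the second model, so the share of predictions that come with an explanation depends on how many sequences the rules cover.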
Abstract:
Many species of Gram-negative bacteria are pathogenic and can cause disease in a host organism. This pathogenic capability is usually associated with certain components in Gram-negative cells. Therefore, developing an automated method for fast and reliable prediction of Gram-negative protein subcellular location will allow us not only to annotate gene products in a timely manner, but also to screen candidates for drug discovery. However, protein subcellular location prediction is a very difficult problem, particularly when more location sites need to be involved and when unknown query proteins do not have significant homology to proteins of known subcellular locations. PSORT-B, a recently updated version of PSORT that is widely used for predicting Gram-negative protein subcellular location, covers only five location sites. Also, the data set used to train PSORT-B contains many proteins with high degrees of sequence identity within the same location group and, hence, may bear a strong homology bias. To overcome these problems, a new predictor, called "Gneg-PLoc", is developed. It fuses many basic classifiers, each trained on a stringent data set containing proteins with strictly less than 25% sequence identity to one another within the same location group, and can cover eight subcellular locations: cytoplasm, extracellular space, fimbrium, flagellum, inner membrane, nucleoid, outer membrane, and periplasm. In comparison with PSORT-B, the new predictor not only covers more subcellular locations, but also yields remarkably higher success rates. Gneg-PLoc is available as a Web server at http://202.120.37.186/bioinf/Gneg. To support people working in the relevant areas, a downloadable file is provided at the same Web site listing the results identified by Gneg-PLoc for 49 907 Gram-negative protein entries in the Swiss-Prot database that have no subcellular location annotations or are annotated with uncertain terms.
The large-scale results will be updated twice a year to cover the new entries of Gram-negative bacterial proteins and reflect the new development of Gneg-PLoc.